Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and its Automatic Evaluation

Authors

Kuang-hua Chen

Hsin-Hsi Chen

Abstract

Acquiring noun phrases from running text is useful for many applications, such as word grouping and terminology indexing. Previously reported work adopts either a purely probabilistic approach or a purely rule-based noun-phrase grammar to tackle this problem. In this paper, we apply a probabilistic chunker to decide the implicit boundaries of constituents and use linguistic knowledge to extract the noun phrases with a finite-state mechanism. The test texts are taken from the SUSANNE Corpus, and the results are evaluated automatically by comparison with the parse field of the SUSANNE Corpus. The results of this preliminary experiment are encouraging.

The partial parser is motivated by an intuition (Abney, 1991): (1) When we read a sentence, we read it chunk by chunk. Abney uses two levels of grammar rules to implement the parser with a pure LR parsing technique. The first-level rules handle the chunking process; the second-level rules tackle the attachment problems among chunks. For historical reasons, our statistics-based partial parser is called a chunker. The chunker receives tagged text and outputs a linear sequence of chunks. We assign a syntactic head and a semantic head to each chunk. Then we extract the plausible maximal noun phrases according to the syntactic-head and semantic-head information, using a finite-state mechanism with only 8 states.
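The idea of extracting noun phrases from tagged text with a finite-state mechanism can be pictured with a small sketch. The Python below is only an illustration under stated assumptions: it uses a hypothetical four-state automaton over Penn Treebank-style tags (DT, JJ*, NN*) and works directly on tagged tokens. It is not the paper's 8-state mechanism and does not model its probabilistic chunker or its syntactic and semantic heads, none of which are specified on this page. All names (`extract_noun_phrases`, `TRANSITIONS`, the state labels) are invented for the example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical states; the paper's mechanism has 8 states that are not listed here.
START, DET, ADJ, NOUN = "START", "DET", "ADJ", "NOUN"


def tag_class(tag: str) -> str:
    """Collapse a Penn Treebank-style tag into a coarse class (an assumption)."""
    if tag == "DT":
        return "det"
    if tag.startswith("JJ"):
        return "adj"
    if tag.startswith("NN"):
        return "noun"
    return "other"


# Allowed transitions: optional determiner, any adjectives, one or more nouns.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    (START, "det"): DET,
    (START, "adj"): ADJ,
    (START, "noun"): NOUN,
    (DET, "adj"): ADJ,
    (DET, "noun"): NOUN,
    (ADJ, "adj"): ADJ,
    (ADJ, "noun"): NOUN,
    (NOUN, "noun"): NOUN,
}


def extract_noun_phrases(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Return maximal DT? JJ* NN+ spans found in a tagged token sequence."""
    phrases: List[str] = []
    buffer: List[str] = []
    state = START
    for word, tag in tagged:
        cls = tag_class(tag)
        nxt = TRANSITIONS.get((state, cls))
        if nxt is None:
            # No transition: emit the buffer only if we stopped in the accepting state.
            if state == NOUN:
                phrases.append(" ".join(buffer))
            buffer = []
            # Re-read the current token from START so it can open a new phrase.
            nxt = TRANSITIONS.get((START, cls), START)
        if nxt != START:
            buffer.append(word)
        state = nxt
    if state == NOUN:
        phrases.append(" ".join(buffer))
    return phrases


sentence = [
    ("The", "DT"), ("partial", "JJ"), ("parser", "NN"), ("receives", "VBZ"),
    ("tagged", "JJ"), ("texts", "NNS"), ("and", "CC"), ("outputs", "VBZ"),
    ("a", "DT"), ("chunk", "NN"), ("sequence", "NN"), (".", "."),
]
print(extract_noun_phrases(sentence))
# ['The partial parser', 'tagged texts', 'a chunk sequence']
```

A table-driven automaton keeps the grammar in one place (`TRANSITIONS`), so adding states, for example for prepositional attachments, means adding rows and not rewriting the loop.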

Similar resources

Automatic titling of Articles Using Position and Statistical Information

This paper describes a system facilitating information retrieval in a set of textual documents by tackling the automatic titling and subtitling issue. Automatic titling here consists in extracting relevant noun phrases from texts as candidate titles. An original approach combining statistical criteria and noun phrases positions in the text helps collecting relevant titles and subtitles. So, the...

Recherche documentaire par titrage automatique

In this paper, we propose a system in order to facilitate the information retrieval in a set of textual documents. Our approach is based on the automatic titling (and subtitling). This last one is crucial, for example, for the issue of web pages accessibility (W3C standard). Our process of automatic titling consists in extracting relevant noun phrases from texts. These ones can represent a titl...

Semi-Automatic Recognition of Noun Modifier Relationships

Semantic relationships among words and phrases are often marked by explicit syntactic or lexical clues that help recognize such relationships in texts. Within complex nominals, however, few overt clues are available. Systems that analyze such nominals must compensate for the lack of surface clues with other information. One way is to load the system with lexical semantics for nouns or adjective...

Surface Grammatical Analysis For The Extraction Of Terminological Noun Phrases

LEXTER is a software package for extracting terminology. A corpus of French language texts on any subject field is fed in, and LEXTER produces a list of likely terminological units to be submitted to an expert to be validated. To identify the terminological units, LEXTER takes their form into account and proceeds in two main stages : analysis, parsing. In the first stage, LEXTER uses a base of ...

Terminology extraction from medical texts in Polish

BACKGROUND Hospital documents contain free text describing the most important facts relating to patients and their illnesses. These documents are written in specific language containing medical terminology related to hospital treatment. Their automatic processing can help in verifying the consistency of hospital documentation and obtaining statistical data. To perform this task we need informat...

Publication date: 1994
